65 research outputs found

The Car That Hit The Burning House: Understanding Small Scale Incident Related Information in Microblogs

Author: Ristoski Petar
Schulz Axel
Publication venue: AAAI Press
Publication date: 01/01/2013

Microblogs are increasingly gaining attention as an important information source in emergency management. In this context, the state of the art has shown that much valuable situational information is shared by citizens and official sources. However, current approaches focus on information shared during large-scale incidents, with a high amount of publicly available information. In contrast, in this paper, we conduct two studies on everyday small-scale incidents. First, we propose the first machine learning algorithm to detect three different types of small-scale incidents with a precision of 82.2% and a recall of 82%. Second, we manually classify users contributing situational information about small-scale incidents and show that a variety of individual users publish incident-related information. Furthermore, we show that those users report faster than official sources.

TUbiblio

MAnnheim DOCument Server

Association for the Advancement of Artificial Intelligence: AAAI Publications

What is special about Bethlehem, Pennsylvania? Identifying unexpected facts about DBpedia entities

Author: Paulheim Heiko
Ristoski Petar
Schäfer Benjamin
Publication venue: RWTH
Publication date: 01/01/2015

MAnnheim DOCument Server

Exploiting semantic web knowledge graphs in data mining

Author: Ristoski Petar
Publication date: 01/01/2018

Data Mining and Knowledge Discovery in Databases (KDD) is a research field concerned with deriving higher-level insights from data. The tasks performed in that field are knowledge intensive and can often benefit from using additional knowledge from various sources. Therefore, many approaches have been proposed in this area that combine Semantic Web data with the data mining and knowledge discovery process. Semantic Web knowledge graphs are a backbone of many information systems that require access to structured knowledge. Such knowledge graphs contain factual knowledge about real-world entities and the relations between them, which can be utilized in various natural language processing, information retrieval, and data mining applications. Following the principles of the Semantic Web, Semantic Web knowledge graphs are publicly available as Linked Open Data. Linked Open Data is an open, interlinked collection of datasets in machine-interpretable form, covering most real-world domains. In this thesis, we investigate the hypothesis that Semantic Web knowledge graphs can be exploited as background knowledge in different steps of the knowledge discovery process and in different data mining tasks. More precisely, we aim to show that Semantic Web knowledge graphs can be utilized for generating valuable data mining features that can be used in various data mining tasks. Identifying, collecting, and integrating useful background knowledge for a given data mining application can be a tedious and time-consuming task. Furthermore, most data mining tools require features in propositional form, i.e., binary, nominal, or numerical features associated with an instance, while Linked Open Data sources are usually graphs by nature. Therefore, in Part I, we evaluate unsupervised feature generation strategies from types and relations in knowledge graphs, which are used in different data mining tasks, i.e., classification, regression, and outlier detection.
As the number of generated features grows rapidly with the number of instances in the dataset, we provide a strategy for feature selection in a hierarchical feature space, in order to select only the most informative and most representative features for a given dataset. Furthermore, we provide an end-to-end tool for mining the Web of Linked Data, which provides functionalities for each step of the knowledge discovery process, i.e., linking local data to a Semantic Web knowledge graph, integrating features from multiple knowledge graphs, feature generation and selection, and building machine learning models. However, we show that such feature generation strategies often lead to high-dimensional feature vectors even after dimensionality reduction, and that the reusability of such feature vectors across different datasets is limited. In Part II, we propose an approach that circumvents the shortcomings of the approaches in Part I. More precisely, we develop an approach that is able to embed complete Semantic Web knowledge graphs in a low-dimensional feature space, where each entity and relation in the knowledge graph is represented as a numerical vector. Projecting such latent representations of entities into a lower-dimensional feature space shows that semantically similar entities appear closer to each other. We use several Semantic Web knowledge graphs to show that such latent representations of entities have high relevance for different data mining tasks. Furthermore, we show that such features can be easily reused for different datasets and different tasks. In Part III, we describe a list of applications that exploit Semantic Web knowledge graphs, besides the standard data mining tasks, like classification and regression. We show that the approaches developed in Part I and Part II can be used in applications in various domains.
More precisely, we show that Semantic Web graphs can be exploited for analyzing statistics, building recommender systems, entity and document modeling, and taxonomy induction. We also focus on semantic annotations in HTML pages, which are another realization of the Semantic Web vision. Semantic annotations are integrated into the code of HTML pages using markup languages, like Microformats, RDFa, and Microdata. While such data covers various domains and topics, and can be useful for developing various data mining applications, additional steps of cleaning and integrating the data need to be performed. In this thesis, we describe a set of approaches for processing long literals and images extracted from semantic annotations in HTML pages. We showcase the approaches in the e-commerce domain. Such approaches contribute to building and consuming Semantic Web knowledge graphs.
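The abstract above notes that most data mining tools need features in propositional form, while Linked Open Data is graph-shaped. As a rough illustration only, the following sketch turns the type statements of a toy graph into one binary feature per type. The triples, the `ex:` names, and the function `binary_type_features` are hypothetical and are not taken from the thesis or its tooling.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def binary_type_features(
    triples: List[Triple], instances: List[str]
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build one binary column per type seen for the given instances."""
    # Collect the set of types asserted for each instance.
    types_of: Dict[str, Set[str]] = {inst: set() for inst in instances}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and subject in types_of:
            types_of[subject].add(obj)

    # Fix a column order so that every instance gets a comparable vector.
    columns = sorted(set().union(*types_of.values()))
    table = {
        inst: [1 if col in types_of[inst] else 0 for col in columns]
        for inst in instances
    }
    return columns, table


# Hypothetical toy graph; only rdf:type statements become features here.
triples = [
    ("ex:Berlin", "rdf:type", "ex:City"),
    ("ex:Berlin", "rdf:type", "ex:Capital"),
    ("ex:Mannheim", "rdf:type", "ex:City"),
    ("ex:Berlin", "ex:locatedIn", "ex:Germany"),
]
columns, table = binary_type_features(triples, ["ex:Berlin", "ex:Mannheim"])
```

Each new type adds a column, which hints at why the abstract reports that such feature vectors become high dimensional.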

MAnnheim DOCument Server

CERN Document Server

A System for Suggestion and Execution of Semantically Annotated Actions based on Service Composition

Author: Jovanovik Milos
Ristoski Petar
Trajanov Dimitar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

MAnnheim DOCument Server

A Hybrid Multi-strategy Recommender System Using Linked Open Data

Author: Loza Mencía Eneldo
Paulheim Heiko
Ristoski Petar
Publication venue: Springer Internat. Publ.
Publication date: 01/01/2014

In this paper, we discuss the development of a hybrid multi-strategy book recommendation system using Linked Open Data. Our approach builds on training individual base recommenders and using global popularity scores as generic recommenders. The results of the individual recommenders are combined using stacking regression and rank aggregation. We show that this approach delivers very good results in different recommendation settings and also allows for incorporating diversity of recommendations.
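The abstract does not say which rank aggregation method is used. As one common possibility, the sketch below combines the ranked lists of several base recommenders with a Borda count; the choice of Borda count, the function `borda_aggregate`, and the book identifiers are assumptions made for illustration, not details from the paper.

```python
from collections import defaultdict
from typing import Dict, List


def borda_aggregate(rankings: List[List[str]]) -> List[str]:
    """Merge ranked lists: an item at position p in a list of n earns n - p points."""
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical outputs of three base recommenders for one user.
rankings = [
    ["book_a", "book_b", "book_c"],
    ["book_b", "book_a", "book_c"],
    ["book_b", "book_c", "book_a"],
]
aggregated = borda_aggregate(rankings)
```

Here `book_b` collects 8 points, `book_a` 6, and `book_c` 4, so the merged list is `book_b`, `book_a`, `book_c`.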

TUbiblio

Crossref

MAnnheim DOCument Server

JDeveloper 11g R2 Jena Adapter Extension

Author: Efremov Marjan
Ristoski Petar
Trajanov Dimitar
Zdraveski Vladimir
Publication venue: Univ. "Ss. Cyril and Methodius", Fac. of Computer Science and Engineering
Publication date: 01/01/2012

MAnnheim DOCument Server

Semantic Stored Procedures Programming Environment and Performance Analysis

Author: Efremov Marjan
Ristoski Petar
Trajanov Dimitar
Zdraveski Vladimir
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Crossref

MAnnheim DOCument Server

Event-based clustering for reducing labeling costs of incident-related microposts

Author: Fürnkranz Johannes
Janssen Frederik
Ristoski Petar
Schulz Axel
Publication venue: RWTH
Publication date: 01/01/2015

TUbiblio

MAnnheim DOCument Server

Event-based clustering for reducing labeling costs of event-related microposts

Author: Fürnkranz Johannes
Janssen Frederik
Ristoski Petar
Schulz Axel
Publication venue: AAAI Press
Publication date: 01/01/2015

Automatically identifying the event type of event-related information in the sheer amount of social media data makes machine learning inevitable. However, this is highly dependent on (1) the number of correctly labeled instances and (2) labeling costs. Active learning has been proposed to reduce the number of instances to label. Although the thematic dimension is already used, other metadata, such as spatial and temporal information that is helpful for achieving a more fine-grained clustering, is currently not taken into account. In this paper, we present a novel event-based clustering strategy that makes use of temporal, spatial, and thematic metadata to determine instances to label. An evaluation on incident-related tweets shows that our selection strategy for active learning outperforms current state-of-the-art approaches even with few labeled instances.
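The abstract gives no algorithmic detail, so the following is only a simplified sketch of the general idea: group posts that agree in time, place, and theme, then pick one post per group to label. The fixed time window, the latitude/longitude grid, the `Post` fields, and both function names are assumptions for illustration and do not reproduce the paper's clustering strategy.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Post(NamedTuple):
    text: str
    timestamp: float  # seconds since some reference point
    lat: float
    lon: float
    topic: str  # thematic label, assumed to be given


def event_clusters(
    posts: List[Post], window_s: float = 3600.0, cell_deg: float = 0.01
) -> Dict[Tuple[int, int, int, str], List[Post]]:
    """Bucket posts by time window, grid cell, and topic (a crude event proxy)."""
    clusters: Dict[Tuple[int, int, int, str], List[Post]] = defaultdict(list)
    for post in posts:
        key = (
            int(post.timestamp // window_s),
            math.floor(post.lat / cell_deg),
            math.floor(post.lon / cell_deg),
            post.topic,
        )
        clusters[key].append(post)
    return clusters


def select_for_labeling(posts: List[Post]) -> List[Post]:
    """Pick the first post of each cluster as the instance to label."""
    return [members[0] for members in event_clusters(posts).values()]


# Hypothetical posts: two about the same fire, one about a separate crash.
posts = [
    Post("fire on main street", 100.0, 49.4875, 8.4660, "fire"),
    Post("smoke near main street", 900.0, 49.4878, 8.4663, "fire"),
    Post("car crash on highway", 200.0, 49.5200, 8.5000, "crash"),
]
selected = select_for_labeling(posts)
```

With these inputs the two fire posts fall into one bucket, so only two of the three posts would be sent for labeling.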

TUbiblio

MAnnheim DOCument Server

Association for the Advancement of Artificial Intelligence: AAAI Publications

GEval: A Modular and Extensible Evaluation Framework for Graph Embedding Techniques

Author: Altabba Abdulrahman
Cochez Michael
Garofalo Martina
Pellegrino Maria Angela
Ristoski Petar
Publication venue: Springer
Publication date: 01/01/2020

VU Research Portal